ABSTRACT



Context. This paper addresses a common problem in astronomy and cosmology: how to optimally select a subset of targets from a larger catalog.
A specific example is the selection of targets from an imaging survey for multi-object spectrographic follow-up. 

Aims. We present a new heuristic optimisation algorithm, HYBRID, for this purpose and undertake detailed studies of its performance. 
Methods. HYBRID combines elements of the simulated annealing, MCMC and particle-swarm methods and is particularly successful in cases 
where the survey landscape has multiple curvature or clustering scales. 

Results. HYBRID consistently outperforms the other methods, especially in high-dimensionality spaces with many extrema. This means many 
fewer simulations must be run to reach a given performance confidence level and implies very significant advantages in solving complex or 
computationally expensive optimisation problems. 

Conclusions. HYBRID outperforms both MCMC and SA in all cases, including optimisation of high-dimensional continuous surfaces, indicating
that HYBRID is useful far beyond the specific problem of optimal target selection. Future work will apply HYBRID to target selection for the 
new 10m Southern African Large Telescope in South Africa. 

Key words. Cosmology: observations - Catalogs - Surveys 



1. Introduction 

In many areas of life one is faced with the problem of allocating 
limited resources to achieve maximal effect. In the case where
this allocation takes the form of selecting a discrete subset of 
targets for further study we have the optimal target selection 
problem. As an example, the targets may be military or min- 
ing: given a large enemy fleet, which subset of targets should 
be attacked in order to inflict maximal damage given finite de- 
fensive capability? In the mining context, given geological and 
geographical information what are the optimal locations for 
new mine shafts/pits to be opened? Although our discussion 
will have an astronomy focus, the formalism we develop will 
be general. 

Classic examples of target selection already implemented 
were the spectroscopic 2dF (Colless et al. 2001) and Sloan
Digital Sky Survey (SDSS) galaxy surveys. In the case of the 
SDSS, two samples of galaxies were selected: the main sam- 
ple (Strauss et al. 2002) and the luminous red galaxy (LRG) 
(Eisenstein et al. 2001) sample. Both were selected with an em- 
phasis on uniformity, providing flux and volume limited sam- 
ples respectively. 



Next-generation Baryon Acoustic Oscillation (BAO) surveys
such as KAOS/WFMOS (Bassett et al. 2006) will take
spectra for over two million galaxies at redshifts ranging from
z ~ 1 to z ~ 2.5 - 4 over areas exceeding 500 deg^2. These
target galaxies will have to be carefully selected from multi- 
colour imaging surveys and a key question will be the extent to 
which the survey will trade off generality for gain in addressing 
specific questions (e.g. the dynamics of dark energy) (Blake et 
al. 2006). 

Optimal target selection can have many facets. For ex- 
ample, cosmological errors often comprise two terms: cosmic 
variance and shot noise. The first pushes the survey to large 
areas and volumes while minimising shot noise pushes the
survey to smaller areas and higher densities (at constant to- 
tal survey time). Area can often be obtained "free of charge" 
by sparsely sampling the sky, i.e. with an inhomogeneous fibre 
density over the sky (Blake et al. 2006). Of course, this has to 
be folded into the actual realisation of the galaxy population 
and in general the optimal choice of targets must be done to- 
gether with the optimisation of the general parameters of the 
survey (Bassett 2005). 

In this paper we will focus on a different target selection
problem in which we want to choose targets in a relatively 
small number of fields of view (FoV) for maximal effect, as op- 
posed to for example selecting the optimal type of galaxy (e.g. 
blue or red sequence etc.). The problem we address is typ- 
ical for smaller surveys where one can 'cherry-pick' the best 
FoV for study. An example in this category is the gathering of 
pairs of Lyman-α spectra of quasars to obtain constraints on
dark energy via the Alcock-Paczynski test, see e.g. McDonald &
Miralda-Escudé (1999).

Section 2 discusses the general formalism we will be using,
while Sections 3 and 4 present the HYBRID algorithm
and results respectively. In this paper we extensively use the 
acronyms FoM and FoV standing respectively for 'figure of 
merit' and 'field(s) of view'. 

2. General formalism 

As with all optimisation we need a Figure of Merit (FoM, also 
known as the utility, cost or objective function) which gives us 
an indication of the suitability/desirability of a given scenario.
By maximising or minimising this FoM (as appropriate) we 
therefore select the best scenario for the problem at hand. To 
be concrete and without loss of generality, we will consider 
maximisation of the FoM as our goal. 

Optimal target selection typically depends on various inter- 
related issues: 

- The nature of the instrument that will be used to undertake 
the survey (e.g. the size of the telescope, the size of the field 
of view etc.). 

- The nature of the input target catalog (how many dimen- 
sions does it span?, how many objects does it contain? 
etc.) 

- The nature of the constraints (what is fixed: total time?, to- 
tal cost? etc.). 

A crucial, but somewhat hidden, role is played by the size 
and shape of the Field of View (FoV). In general it can have any 
shape and size but in this paper is always taken as circular of 
radius R. Two key dimensionless parameters which determine 
what method should be used are the ratios $L/R$ and $R_0/R$, where
$L$ is the characteristic size of the input catalog (on the sky) and
$R_0$ is the characteristic clustering scale of the data on the sky.

2.1. The discrete case 

Consider a large but discretely distributed input catalog of tar- 
gets, T. In astronomical applications a classic example is a col- 
lection of galaxies. Objects in T will differ in spatial position 
(x, y, z) or (RA, DEC, z). In addition, targets may carry ex- 
tra information, such as their colours (e.g. in the SDSS survey 
one has u, g, r, i, z colours), discrete information regarding type
(QSO, LRG, spiral, elliptical etc.) and so on. In general ele- 
ments of T will be n-dimensional vectors. Some of the com- 
ponents of these vectors can change continuously while others 
are discrete. 



The most basic approach to optimal target selection is to 
maximise the following Figure of Merit (FoM) constructed by 
summing over the FoM of each field of view: 

$$\mathrm{FoM} = \sum_i \|\mathrm{FoM}_i\| = \sum_{i,j} w_{ij} \qquad (1)$$

where the sum over $i$ runs over all fields of view (FoV) in the
survey and the sum over $j$ runs over the objects in the $i$th
FoV. Each object is given a weighting $w_{ij}$ which determines
how useful it is to the overall survey aims. Since we do not want
to double-count any objects this must be taken into account in
the computation of the FoM (often optimal FoV may overlap to
some extent). This requirement is denoted with $\|\cdot\|$. The FoM
must be optimised subject to a constraint such as: 

$$\sum_i f_i < f^{*}. \qquad (2)$$

Here $f^{*}$ could represent the total time available for the survey,
in which case the total number of fields to be observed is not 
necessarily constant (some FoV may contain brighter targets 
than others). The constraint may also be of a simpler form such 
as the total number of FoV being fixed (the case we consider in 
this paper). 

The choice of weightings, $w_{ij}$, will depend on the input
catalog and aims of the survey. As an extreme example, galaxy
surveys will give zero weight to stars in the same FoV; indeed
stars may even be given negative $w_{ij}$ to discourage looking
through the galactic plane. A more interesting example is that
$w_{ij}$ may be chosen to implement hard or soft colour or redshift
cuts or be designed to minimise selection or other biases.

In our discrete examples below, we have chosen 2- 
dimensional target catalogs, but the ideas scale trivially to 
higher-dimensional catalogs. Note that in general, T can have 
both a continuous and discrete dependence. 
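
As a minimal sketch of Eq. (1) for circular FoV on a flat patch of sky, the following Python function sums the weights of all targets that fall inside at least one FoV, so that targets in overlapping FoV are counted only once. The function name `discrete_fom`, the planar distance measure and the default unit weights are illustrative assumptions, not the implementation used for the results in this paper.

```python
import math

def discrete_fom(fov_centres, radius, targets, weights=None):
    """Sum the weights of all targets inside at least one circular FoV.

    A target covered by several overlapping FoV contributes once, which is
    one simple reading of the no-double-counting requirement in Eq. (1).
    Assumes a flat (planar) sky; all names here are hypothetical.
    """
    if weights is None:
        weights = [1.0] * len(targets)  # unit weights: simply count targets
    total = 0.0
    for (tx, ty), w in zip(targets, weights):
        if any(math.hypot(tx - cx, ty - cy) <= radius for cx, cy in fov_centres):
            total += w
    return total

# Two overlapping FoV: the two targets they share are counted once each.
targets = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (10.0, 10.0)]
fovs = [(0.0, 0.0), (0.6, 0.0)]
fom = discrete_fom(fovs, 1.0, targets)
```

Negative weights (for example for stars) are handled by the same sum without modification.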

2.2. The continuous case 

In general, the input catalog may effectively be continuous. For 
example, we may have an effectively continuous map such as a 
CMB or X-ray map rather than a discrete set of sources. While 
all maps are essentially discrete at the pixelisation level of the 
detector, there may be a huge number of pixels in the field of 
view or one may simply be dealing with a time-ordered stream 
of data. 

In this case, it is more appropriate to define the FoM to be 
$$\mathrm{FoM} = \sum_i \int_{\Omega_i} W(\theta, \phi)\, d\Omega \qquad (3)$$

where $\Omega_i$ defines the interior of the $i$th FoV (and ensures the
same region is not double counted) and $\theta$, $\phi$ parametrise
position on the sky. The weight function $W$ is now a continuous
function of angle on the sky. Again the FoM would be opti- 
mised subject to a constraint, typically that the total amount of 
survey time is fixed. 
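
For a pixelised map the integral over the FoV interiors can be approximated by a sum over pixels. The sketch below assumes a flat square patch of side `extent`, circular FoV and a midpoint rule; the function name `continuous_fom` and the grid resolution `n` are hypothetical choices made only for illustration.

```python
import math

def continuous_fom(fov_centres, radius, weight, extent, n=400):
    """Midpoint-rule estimate of the sum over FoV of the integral of W.

    The patch [0, extent] x [0, extent] is split into n x n pixels. A pixel
    covered by several FoV contributes once (no double counting).
    """
    pixel = extent / n
    total = 0.0
    for ix in range(n):
        x = (ix + 0.5) * pixel
        for iy in range(n):
            y = (iy + 0.5) * pixel
            if any(math.hypot(x - cx, y - cy) <= radius for cx, cy in fov_centres):
                total += weight(x, y) * pixel * pixel
    return total

# With W = 1 the FoM of a single FoV is just its area, approximately pi * R^2.
area = continuous_fom([(2.0, 2.0)], 1.0, lambda x, y: 1.0, 4.0)
```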

A relevant example is given by spectroscopic followup 
of clusters discovered using cosmic microwave background 
maps sensitive to the Sunyaev-Zel'dovich effect, such as will 
be the case for ACT (http://www.hep.upenn.edu/act/), SPT
(http://spt.uchicago.edu/) and similar surveys. In this case,
$W(\theta, \phi)$ could be chosen to be the temperature decrement or
some other useful quantity measuring the significance of the
detection or mass of the cluster.

2.3. Cross-correlating multiple data sets 

Optimal target selection comes into its own when it is used 
on multiple data sets (where human abilities start to fade). For 
example, given a limited spectroscopic subsample of bright 
galaxies, one may combine it with deep photometric images of 
the same part of the sky to search for bright galaxies preferen- 
tially in the middle of clusters and hence surrounded by a large 
number of less bright galaxies. This would provide additional 
targets for multi-object spectroscopy. 

One may combine multiple data sets in several ways. First 
one could compute the cross-correlation between the various 
data sets in each FoV. Alternatively, if one is simply trying to 
maximise the effective number of targets, one can extend Eq.
(1) by summing over the corresponding objects in each data set
visible in that FoV. As an example, there may be only 3 objects 
visible in a given FoV in the first catalog but perhaps 10 are 
visible in the same FoV in the second catalog. 

Of course, the eventual aim is typically to minimise errors 
on an estimate of some relevant quantity or set of parameters. 
Hence the extra objects may not be worthwhile in the sense
that they may be very faint and require a great deal more
integration time to acquire. This can be dealt with by making
the weights $w_{ij}$ depend on the brightness or redshifts of the
objects. A totally integrated approach to this problem would 
make use of a framework such as the Integrated Parameter 
Space Optimisation (IPSO) formalism (Bassett 2005) which 
optimises the design of a survey by using a FoM that depends 
directly on the size of the final error bars on quantities of inter- 
est. Here we neglect this final step for simplicity. 

3. Finding the optimal targets 

Having defined our FoM, we now have a function defined, in 
the simplest cases, over the subset of the sky defined by the tar- 
get catalog. The aim is now to find the set of FoV that maximise 
the FoM while respecting the constraint (2). This presents an
unusual global optimisation problem. The aim is to find a group 
of directions for all the FoV which, taken together, maximise 
the FoM. 

In general this is a non-trivial problem since it is non-local. 
One possible approach is to use grid division: divide the target 
catalog on the sky into a lattice and compute the number of 
objects within each grid element. The best survey is then trivial: 
simply strategically place a FoV in each of the N most densely
populated grid elements (assuming the FoV are small enough 
that neighbouring FoV do not overlap). 
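
The grid-division approach can be sketched in a few lines. The helper name `densest_cells` and the square cells of side `cell_size` are assumptions made for illustration; the routine bins the targets and returns the centres of the most populated cells.

```python
from collections import Counter

def densest_cells(targets, cell_size, n_fov):
    """Bin targets on a square lattice and return the n_fov densest cells.

    Each returned tuple is (x_centre, y_centre, count). Overlap between
    neighbouring FoV is ignored, as assumed in the text.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in targets
    )
    return [
        ((ix + 0.5) * cell_size, (iy + 0.5) * cell_size, c)
        for (ix, iy), c in counts.most_common(n_fov)
    ]

targets = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (5.5, 5.5), (5.6, 5.7), (9.0, 1.0)]
best = densest_cells(targets, 1.0, 2)
```

Rerunning the same call with a different `cell_size` gives a quick feel for how strongly the answer depends on the lattice spacing.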

This method is flawed in several ways, however. Firstly, the 
optimal number of grid elements to use is not obvious; it
depends very much on the particular data set. A finer grid mesh
is not always better. It might be the case that, for a particular
grid size, a cluster small enough to fit into one grid element is
shared evenly between a few grid elements. The over-density of
this cluster will then not be as obvious as it would be if it were 
fortunate enough to lie only in one grid element. Secondly, the 
computation time is proportional to the number of grid ele- 
ments and usually proportional to the square of the number of 
objects and hence is unfeasible in general for very large data 
sets (which is the case we are mainly interested in). 

When the data are sparsely distributed a much more effi- 
cient approach is to center a FoV on each point and then com- 
pute the best survey that way. This again is inefficient for large 
data sets and suffers from the problem that the best FoV will 
rarely be centered on one of the data points. 

An improvement to the lattice approach is to use an adap- 
tive grid which iteratively refines the grid in the best areas. We 
did not pursue this approach because of the added complexity.
Instead we concentrated on heuristic and stochastic methods. 

3.1. A new search algorithm - HYBRID 

To address the problem of optimisation in general, and optimal 
target selection in particular, we have designed a new heuris- 
tic algorithm we call HYBRID, which combines elements of 
Simulated Annealing (SA) (Kirkpatrick et al. 1983), Markov- 
Chain Monte Carlo (MCMC) (Metropolis et al. 1953) and 
Particle Swarm Optimisation (PSO) heuristics. HYBRID is a 
stochastic search algorithm whose basic idea is to run m FoV 
simultaneously on the data set (where m is determined by the 
number of FoV to be observed). At each step in the simulation, 
information about the performance of each FoV (encoded in its 
own FoM, denoted $\mathrm{FoM}_i$) is shared among the m FoV and the
information is used to guide the future dynamics of each FoV.
In this sense both MCMC and SA are special cases of HYBRID
in which no information is shared between different FoV. 

While this key idea in HYBRID can be implemented in 
many ways, we have chosen the following implementation. 
As with standard MCMC and SA implementations, each FoV 
moves randomly around the allowed region according to the 
law:

$$\mathbf{x}_{j+1} = \mathbf{x}_j + \delta\mathbf{x}_j \qquad (4)$$

where $\delta\mathbf{x}_j$ is drawn from an appropriate multivariate
probability distribution. Typically this is chosen to be a multivariate
Gaussian with variance $\sigma_j$. In the case of standard MCMC, $\sigma_j$
is the same at all steps while for SA the acceptance probability
of a bad step decreases monotonically with step number $j$
although the variance $\sigma_j$ is constant. Corresponding to each
$\mathbf{x}_j$ there is a FoV with Figure of Merit (FoM) labelled
$\mathrm{FoM}_{i,j}$, i.e. step $j$ of the $i$-th FoV. The proposed new
position $\mathbf{x}_{j+1}$ is accepted with a probability governed by
the Hastings-Metropolis prescription, i.e. with probability:

$$P(\mathbf{x}_{j+1}|\mathbf{x}_j) = \min\left(e^{\alpha(\mathrm{FoM}_{i,j+1} - \mathrm{FoM}_{i,j})}, 1\right) \qquad (5)$$

In other words, if $\mathrm{FoM}_{i,j+1} > \mathrm{FoM}_{i,j}$ the step is
always accepted (here we assume the aim is to maximise the FoM), otherwise
the system accepts a worse step ($\mathrm{FoM}_{i,j+1} < \mathrm{FoM}_{i,j}$)
with a reduced probability that depends on $\alpha$, which controls how
lenient the algorithm is towards accepting worse steps.
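
The acceptance rule for maximisation can be written as a short function. This is a sketch under the reading that a worse step is accepted with probability exp(αΔ), where Δ is the (negative) change in FoM; the name `accept_step` and the injectable random generator `rng` are hypothetical conveniences.

```python
import math
import random

def accept_step(fom_new, fom_old, alpha=0.5, rng=random):
    """Metropolis-style acceptance for maximisation of the FoM.

    Improvements are always accepted. A worse step is accepted with
    probability exp(alpha * delta), where delta = fom_new - fom_old < 0.
    """
    delta = fom_new - fom_old
    if delta >= 0:
        return True
    return rng.random() < math.exp(alpha * delta)
```

A larger `alpha` makes the walker less tolerant of worse positions, while `alpha = 0` accepts every proposal and reduces the search to a pure random walk.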

For the HYBRID algorithm we split, at any step $j$, the $i$th
component of the vector variance $\sigma_j$ into the product:

$$\sigma^i(j) \equiv \sigma^i_0 \times f_i(j) \times g(j). \qquad (6)$$

In other words the variance of the probability distribution from
which the step size for a particular FoV is drawn at each step
depends on two functions, $f$ and $g$, as well as an overall constant
normalisation $\sigma^i_0$ that can be different in the different
dimensions of the space (we typically took it to be a fixed fraction of
the input catalog size in each dimension).

The function $f_i$, which we call the penalty function, shares
information spatially among the various FoV. The basic idea is
simple: if a given FoV is doing very badly relative to the other
m - 1 FoV, it is in a bad region and statistically speaking should
take bigger steps to get to a better region. If the field of view 
is doing very well relative to the rest of the FoV, it should take 
very small steps and probe its immediate neighbourhood for 
even better configurations. 

This deals with the annoying habit of optimal target selec- 
tion that while one or more FoV may quickly find a rich clus- 
ter, they will typically wander off before the other FoV can find 
good regions and hence the final "optimal" survey will typi- 
cally be significantly sub-optimal. 

We let $f_i$ depend monotonically on the variable
$p_i(j) = \mathrm{FoM}_i(j)/\langle\mathrm{FoM}(j)\rangle$, the ratio of the
FoM of the $i$th FoV to the average FoM (both at step $j$). In our
simulations we chose $f_i(p_i)$ to be linearly decreasing from $p_i = 0$ to
$p_i = 1$ and decreasing as $p_i^{-\gamma}$ for $p_i > 1$, typically with
$\gamma = 2$, and the matching condition $f(p_i = 1) = 1$, as shown in Fig. 1.
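
One piecewise form with these properties is sketched below. It assumes a linear branch that falls from a value `f_max` at p = 0 to 1 at p = 1, followed by a power law with index `gamma`. The value of `f_max` and the exact power-law shape are illustrative assumptions; only the monotonic decrease and the matching condition f(1) = 1 are taken from the text.

```python
def penalty_f(p, gamma=2.0, f_max=5.0):
    """Penalty function: large steps for poor FoV, small steps for rich ones.

    p is the ratio of one FoV's FoM to the average FoM at the same step.
    f_max (the value at p = 0) is a hypothetical parameter.
    """
    if p < 0:
        raise ValueError("p is assumed to be a ratio of non-negative FoM values")
    if p <= 1.0:
        return f_max - (f_max - 1.0) * p  # linear branch, f(1) = 1
    return p ** (-gamma)                  # power-law branch, also 1 at p = 1

# Poor, average and rich FoV respectively.
values = [penalty_f(0.0), penalty_f(1.0), penalty_f(2.0)]
```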



Fig. 1. Various forms of the function f used in our simulations
as a function of $p = \mathrm{FoM}_i(j)/\langle\mathrm{FoM}(j)\rangle$. f
enforces the requirement that the FoV take small steps in rich areas
(p ≫ 1) and large steps in poor areas (p ≪ 1).

Fig. 2. Path of one FoV (from left to right) across the
gradient+clusters data-set described in section 4.2. At this scale
the path appears to end abruptly two-thirds of the way across the
dataset. Fig. 3 shows how the FoV converges rapidly onto a
rich cluster.

When the FoM is continuous and p ≫ 1 (≪ 1 for minimisation)
it can be very profitable to implement a transition in
f. When p ≫ 1 the FoV is performing very well and hence
one can be fairly confident of the FoV being near a local
maximum and hence one can adapt f to the natural scale of that
maximum, namely one can make a rough estimate of the
curvature of the extremum by histogramming the recent chain values
in each dimension. One can then choose f to be the standard
deviation of this histogram in each dimension. Hence the
characteristic step taken will be adapted to the local curvature of
the extremum. A similar version to this was used in our tests of
Griewangk's function in section 4.5 and greatly improved the
final approach to the minimum.



A different approach in this situation is to switch to a purely 
deterministic method. When p is sufficiently large one can 
make a deterministic estimate of the gradient and use any of the 
standard gradient-based methods to converge to the extremum 
analytically, basically by approximating the FoM as quadratic 
near the minimum. In summary, the function f implements the
spatial and parallel sharing of information as occurs in parti- 
cle swarm optimisation and ensures that good performers are 
rewarded while poor performers are aided. 
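
The histogram-based estimate of the local scale mentioned above amounts to taking the per-dimension standard deviation of the most recent chain positions. The sketch below assumes the chain is stored as a list of coordinate tuples; the name `local_scale` and the window length are hypothetical.

```python
import statistics

def local_scale(chain, window=50):
    """Per-dimension standard deviation of the last `window` chain positions.

    The result can be used as the characteristic step size in each
    dimension once a FoV is believed to be near an extremum.
    """
    recent = chain[-window:]
    return tuple(statistics.pstdev(dim) for dim in zip(*recent))

chain = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
scale = local_scale(chain)
```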

On the other hand, the function g is chosen to help the 
whole set of FoV settle into their local optima as time goes by, 
thereby helping to yield an optimal total survey. The function 
g represents a performance-dependent cooling of the average 
step size. 

In SA, the probability with which bad steps are accepted 
is reduced gradually and monotonically with time according to 
some cooling schedule, e.g. $\alpha \propto (\log(1 + j))^{-1}$ (Kirkpatrick
et al. 1983). With a sufficiently slow cooling schedule one is 
assured of reaching the global optimum. However, this is slow. 
A faster cooling schedule may deliver better results in some 
cases but may also be trapped in glassy, local extrema - there is 
no assurance that the global optimum will be found. 

Instead of letting g depend directly on j, we again share
information between the m FoV. We let g depend on the variable
$q = \langle\mathrm{FoM}(j)\rangle/\langle\mathrm{FoM}(1)\rangle$, i.e. the
ratio of the average FoM at step j to that at step 1. Hence, the
evolution of g(q) may be non-monotonic and there is no need for
reannealing (increasing g by hand at some point): if the survey is
performing badly the step size will automatically increase. In our
simulations we chose:

(7)

with $\beta \in [0.1, 1]$. For large q the survey is performing well and
g is small.

Fig. 3. Zoom of Fig. 2 showing the final part of the trajectory
of the FoV centre after locking onto a cluster. On finding
the cluster the average step size plummets by a large factor,
allowing the FoV to fully explore the cluster. With standard
MCMC or SA methods the FoV would have evolved away
from the cluster before exploring it. Cluster points are denoted
by squares. On average, away from clusters, a region this size
would contain less than one point. The radius of the FoV for
this data set is 5 showing how the optimal FoV captures most
of the points in the cluster.

Fig. 4. Behaviour of the FoM for the single FoV shown in Figs
2 and 3 as well as the corresponding f and g as a function of
step number j. Note how f varies rapidly, anti-correlated with
the FoM, while g slowly decreases so as to relax the full system
of FoV towards the global optimum. At j ~ 150 the FoV comes
across the cluster and explores it to j > 10^4, illustrating how
effective HYBRID is. The increase of f for j > 200 shows
the improved performance of the remaining FoV in the survey
which asymptotically match the performance of this FoV (since
f → 1).

Figures 2, 3 and 4 show the evolution of a single FoV through
the gradient dataset described in section 4.2, which ended up
locking on to a cluster. The discovery of a rich cluster causes f
to drop by a large factor, allowing the FoV to stick to the cluster
and explore it thoroughly. Figure 4 shows that after a few
hundred steps the FoM begins to level off. Notice, however, that
thereafter, even though the FoM remains more or less constant,
the magnitude of f increases by about a factor of 20 so that
after 10000 steps it is back up to 1. The reason for this is that
while the FoV is exploring the cluster, the other m - 1 FoV
are themselves moving to progressively better regions. This has
the effect of diminishing the performance of the FoV relative
to the others and so f → 1. The FoV should still scout the
cluster even though its performance relative to the other FoV 
is deteriorating. In order to ensure this, the cooling schedule, 
controlled by g, should decrease the step sizes of all FoV. Thus, 
even though the FoV, after several hundred steps, is on a par 
with the other m - 1 FoV, all the FoV are taking smaller steps 
thanks to g. By handing over control of the FoV to g at later 
times, each FoV can still effectively explore optimal regions by 
being forced to take smaller step sizes. 
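
The interplay of the step size, f, g and the acceptance rule can be illustrated with a compact sketch of the whole loop. Everything below is a toy under stated assumptions rather than the code used for the results in this paper: unit weights, a flat square survey of side `size`, proposals clipped to the survey boundary, the piecewise `penalty_f` with a hypothetical `f_max`, and in particular `cooling_g`, which is only a stand-in chosen to be small for large q and is not Eq. (7).

```python
import math
import random

def fov_count(centre, radius, targets):
    """FoM_i for unit weights: number of targets inside one circular FoV."""
    cx, cy = centre
    return sum(1 for tx, ty in targets if math.hypot(tx - cx, ty - cy) <= radius)

def penalty_f(p, gamma=2.0, f_max=5.0):
    """Assumed f: linear from f_max (p = 0) to 1 (p = 1), then p**-gamma."""
    return f_max - (f_max - 1.0) * p if p <= 1.0 else p ** (-gamma)

def cooling_g(q, beta=0.5):
    """Illustrative stand-in for g(q): small when q is large. Not Eq. (7)."""
    return max(q, 0.1) ** (-beta)

def hybrid(targets, n_fov, radius, size, n_steps, alpha=0.5, sigma0=0.1, seed=0):
    """Move n_fov fields of view over a size x size patch for n_steps steps."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_fov)]
    fom = [fov_count(c, radius, targets) for c in pos]
    mean_ref = None  # average FoM at the first step where it is non-zero
    for _ in range(n_steps):
        mean = sum(fom) / n_fov
        if mean_ref is None and mean > 0:
            mean_ref = mean
        g = cooling_g(mean / mean_ref) if mean_ref else 1.0
        for i in range(n_fov):
            # p compares this FoV with the average over all FoV at this step
            p = fom[i] / mean if mean > 0 else 1.0
            scale = sigma0 * size * penalty_f(p) * g
            # Gaussian proposal, clipped to the survey area
            x = min(max(pos[i][0] + rng.gauss(0.0, scale), 0.0), size)
            y = min(max(pos[i][1] + rng.gauss(0.0, scale), 0.0), size)
            new = fov_count((x, y), radius, targets)
            delta = new - fom[i]
            # Always accept improvements, sometimes accept worse steps
            if delta >= 0 or rng.random() < math.exp(alpha * delta):
                pos[i], fom[i] = (x, y), new
    return pos, fom

# Toy data: one cluster of 30 points on a sparse uniform background.
data_rng = random.Random(1)
targets = [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(30)]
targets += [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(50)]
positions, foms = hybrid(targets, n_fov=3, radius=1.0, size=10.0, n_steps=200)
```

Setting `penalty_f` and `cooling_g` to return 1 recovers independent Metropolis walkers, which is the sense in which MCMC is a special case of this scheme.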

The third component to HYBRID is the acceptance ratio,
$\alpha$. In the standard Hastings-Metropolis method for MCMC, a
jump to a worse position is accepted with probability $\exp(\alpha\Delta)$
where $\alpha = 1/2$ and $\Delta = \mathrm{FoM}_{j+1} - \mathrm{FoM}_j$ is
the difference in FoM between the proposed new step and the current step.
Clearly $\alpha$ plays a key role in determining how likely the
system is to accept a bad step (which is important in escaping from
local minima).

In HYBRID we can assign a different value of $\alpha$ to each
FoV and allow it to depend on step j, as well as allowing the resulting
$\alpha_i$ to depend on the information gathered by the system, as with
the variance $\sigma_j$. The basic philosophy for this is as follows: if
a FoV quickly finds a good region of targets we do not want
it wandering off while the rest of the FoV search for greener
pastures. One could ensure this by fixing $\alpha$ to be large, but then
FoV would get stuck even in relatively poor regions and not be
able to look for anything better. Conversely, if $\alpha$ is small, the
FoV will leave excellent regions before the rest of the FoV find
good regions.

Therefore allowing $\alpha$, like f and g, to depend on the ratio
p provides a dynamic way of helping FoV to "stick" to good
clusters. Here "good" is defined relative to the average FoM
of all the various FoV, i.e. by p. A simple choice therefore for
the functional form of $\alpha_i(p)$ is to make it roughly go as
$f_i^{-1}(p)$. However, in our simulations below we choose a constant
$\alpha$ for simplicity (except in the case of simulated annealing).

The above choices for f, g and $\alpha$ are motivated by the
required asymptotics but are otherwise fairly arbitrary. It would
be ideal to have a method of converging automatically
to appropriate functions for each data set, a problem left
for future work.

3.2. Simulated Annealing

Simulated annealing corresponds to the specific case f = g = 1.
True SA corresponds to the choice $\alpha \propto (\ln(1 + j))^{-1}$, i.e. a
logarithmic cooling schedule, where $\alpha$ is interpreted as the
temperature of the system. This cooling ensures finding the global
minimum in the infinite time limit. However, we also consider 
inverse power-law dependence on j which typically gives bet- 
ter results after a relatively small number of steps. 

The problem with SA is that the acceptance probability of 
bad steps can become extremely small early on in the simula- 
tion even when the system is trapped in a poor position. This 
typically means the system must be reannealed by resetting the 
cooling schedule T to some previous value. With HYBRID this 
is not necessary since the cooling schedule depends on step 
number j as well as the controlling function g and so reanneal- 
ing occurs automatically and only when necessary. 

3.3. Step cooling 

An alternative to SA is to keep $\alpha$ constant but to slowly cool
the average size of the steps, i.e. to have g = g(j). We call
this step cooling (SC) and have explicitly run simulations with
a logarithmic step cooling schedule, i.e. $g \propto (\ln(1 + j))^{-1}$.
Performance on the test cases presented here was comparable
to, but not quite as good as, that of standard SA, although we do not
include it in the figures for clarity. In both SA and step cool- 
ing the system ends up effectively trapped - in the former case 
because a smaller and smaller fraction of attempted jumps are 
successful and in the latter because the step size approaches 
zero. In all cases we tested HYBRID significantly outperforms 
both of these methods. 
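
A logarithmic step-cooling schedule is simple to write down. In the sketch below the normalisation g(1) = 1 and the function name `log_step_cooling` are assumptions made for illustration; the text fixes only the proportionality to 1/ln(1 + j).

```python
import math

def log_step_cooling(j):
    """Step-size factor g(j) proportional to 1 / ln(1 + j), for j >= 1.

    Normalised (by assumption) so that g(1) = 1.
    """
    if j < 1:
        raise ValueError("step number j starts at 1")
    return math.log(2.0) / math.log(1.0 + j)

# The factor shrinks slowly: it is still above 0.07 after 10000 steps.
schedule = [log_step_cooling(j) for j in (1, 3, 100, 10000)]
```

Because g depends on j alone, nothing in this schedule can enlarge the steps again if the survey is performing badly, in contrast to the performance-dependent g(q) used in HYBRID.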

3.4. Particle Swarm Optimisation 

Particle Swarm Optimisation (PSO) is a method modelled on 
swarms and herds found in nature that use a large number
of agents to efficiently search a volume for an optimal point.
The dynamics of each individual in the swarm depends on the 
dynamics and success or failure of the other agents (see e.g. 
(Eberhart et al. 1995), (Kennedy et al. 2001)). For an applica- 
tion to astrophysics see (Skokos et al. 2005). In the context of 
HYBRID, PSO can naturally be implemented by setting g = 1
and allowing spatial "communication" between the FoV via f.

There is an important difference between PSO as imple- 
mented in standard optimisation algorithms and the above im- 
plementation in HYBRID. One of the main aims of HYBRID is 
the study of the optimal target selection problem where the aim 
is for each agent in the swarm to find the best possible position 
for itself that does not overlap with other agents and which, 
taken together, give the best possible combined FoM. Hence, 
HYBRID does not use the positions of the FoV, only their indi- 
vidual performances. Of course, this can easily be generalised 
in the case of standard optimisation where the desire is to find 
a single optimum. 



3.5. MCMC 

Standard Markov-Chain Monte Carlo (MCMC) is governed by 
a further specialisation, namely f = 1, g = 1. In this case
the variance of the jump distribution is constant in time for all 
fields of view, as is the acceptance ratio a. Although MCMC 
is not typically used as an optimisation algorithm it is assured 
to find the global optimum in the long-time limit and hence 
provides a very simple, robust optimisation method. MCMC is 
the blind and deaf version of HYBRID and provides a baseline 
which allows us to understand how useful the proposed new 
functions f and g are.



4. Results 

In this section we discuss the results of the various methods 
(HYBRID, MCMC and SA) against various simulated data- 
sets and functions. We will show how HYBRID wins over 
MCMC and SA in three ways: (1) The maximum average FoM 
achieved is always higher. (2) The convergence to the average 
maximum is faster and (3) the HYBRID method is much more 
consistent in delivering good results so the spread around the 
maximum is much smaller. Combined this yields significant 
improvements in performance over SA and MCMC. 

In all our discrete data simulations we take $w_{ij} = 1$ and
we therefore simply attempt to maximise the number of points
lying within the 20 FoV which we use to define the complete
survey.

4.1. Clusters on a uniform background 

Our first test data set consists of uniformly distributed points 
with multiple clusters superimposed on the distribution. As 
such it is a crude mimic of typical data sets where optimal target 
selection is useful, i.e. finding very good regions embedded in 
random low-density areas. While we tried many different con- 
figurations the figures show results for 15000 points, two-thirds 
of which are uniformly distributed and one-third are contained 
in 100 clusters of characteristic radius of 0.2% the total survey 
size each containing 50 points. The radius of the (circular) 
FoV is taken to be 0.1% the size of the survey (half the
characteristic radius of the clusters).
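
A data set of this kind is easy to generate. The sketch below assumes a square survey of unit side, uniformly placed cluster centres and Gaussian cluster profiles whose standard deviation equals the quoted characteristic radius; these choices, and the name `make_cluster_dataset`, are illustrative assumptions rather than the recipe used for the figures.

```python
import random

def make_cluster_dataset(n_uniform=10000, n_clusters=100, per_cluster=50,
                         size=1.0, cluster_radius=0.002, seed=0):
    """Uniform background plus compact clusters on a size x size patch.

    cluster_radius is a fraction of the survey size (0.002 = 0.2%).
    Returns a list of (x, y) tuples, background points first.
    """
    rng = random.Random(seed)
    points = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_uniform)]
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        for _ in range(per_cluster):
            points.append((rng.gauss(cx, cluster_radius * size),
                           rng.gauss(cy, cluster_radius * size)))
    return points

points = make_cluster_dataset()
```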

The performance of the various methods is shown in Fig.
6 with HYBRID outperforming MCMC and SA both in terms
of maximum FoM and acceleration of the FoM with step
number. The figure shows the FoM averaged over thousands of runs
for each method. In addition we show the 1σ error bars on the
FoM at each step. Note how for j > 4000 the HYBRID method
has significantly smaller error bars showing that there is in- 
creased consistency as well as better performance on average. 
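
The averaged curves and their error bars can be computed from a set of independent runs as follows. The sketch assumes each run is stored as a list of FoM values indexed by step and uses the population standard deviation; the name `mean_and_sigma` is hypothetical.

```python
import statistics

def mean_and_sigma(runs):
    """Per-step mean and 1-sigma spread of the FoM over independent runs.

    runs[r][j] is the FoM of run r at step j. All runs must have the same
    length. Returns a list of (mean, sigma) tuples, one per step.
    """
    return [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*runs)]

runs = [[1, 2], [3, 2]]
summary = mean_and_sigma(runs)
```

A method with a smaller sigma at a given step needs fewer repeated runs to deliver a result close to its average performance.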



¹ For comparison we note that a typical galaxy cluster may subtend
a few arcminutes on the sky at a redshift z < 1. Hence our clusters 
are about the appropriate size for simulating a cluster embedded in a 
survey covering a 50° x 50° region on the sky. 
Fig. 5. Test dataset 1: A uniform distribution of 10^4 points with
100 uniformly distributed dense clusters superimposed, each
containing 50 points.



[Plot: curves labelled MCMC and Simulated Annealing; horizontal axis j.]

Fig. 6. Performance on the dataset shown in Fig. 5 of the
various methods as a function of step j and averaged over all 5000
runs. The errorbars show the 1σ variation in the FoM over the 5000
runs.



Fig. 7. Test dataset 2: Gradient background with 25000 points 
with 60 superimposed dense clusters each containing 50 points. 
Fig. 8. Performance, on the gradient dataset in Fig. 7, of the
various methods as a function of step j and averaged over all
runs. The 1σ errorbars shown are computed from 5000 runs.



4.2. Clusters on a gradient 

An extension of the first data set is to superimpose the clus- 
ters on a large gradient, shown in Fig. 7. This dataset tests
the algorithm's ability to find large regions of high intensity as
opposed to small clusters and serves as a further examination 
of the speed of convergence to optimal regions. 

The dataset used for our simulations contained 28000
points in total (25000 background points plus 60 embedded
clusters containing 50 points each), with the clusters having a
characteristic radius of 0.5% of the total survey size.
The radius of the FoV is taken to be 0.25% of the size of the
survey, again half the characteristic radius of the clusters.

The HYBRID algorithm outperforms both SA and MCMC
in the same way as before, leading to higher average FoM and
to significantly smaller error bars. This illustrates how successful
the HYBRID algorithm is at sensing the overall gradient
in the dataset and responding accordingly: FoV in sparse areas
are forced by the function f to take large steps, which favour
moving across the gradient.

This extra consistency is highlighted in Fig. 9, which
shows histograms of the FoM values at j = 5000 for both
HYBRID and MCMC (SA is similar to MCMC). It is clear
that the HYBRID runs are significantly more tightly clustered
than the MCMC runs. In practice this means that fewer
HYBRID simulations need to be run to get good results.
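
The link between consistency and the number of runs required can be made concrete with a short sketch. It assumes, purely for illustration, that a fraction p of independent runs reaches some target FoM; the number of runs N needed for at least one success with confidence c then follows from (1 - p)^N <= 1 - c. The function name and the values of p below are hypothetical and are not taken from the simulations in this paper.

```python
import math


def runs_needed(p_success, confidence=0.95):
    """Smallest N such that at least one of N independent runs succeeds
    with probability >= confidence, given per-run success probability
    p_success. Solves (1 - p_success)**N <= 1 - confidence for N."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    if p_success == 1.0:
        return 1
    n = math.log(1.0 - confidence) / math.log(1.0 - p_success)
    return max(1, math.ceil(n))


# Hypothetical per-run success fractions, for illustration only:
# a tightly clustered FoM histogram versus a broadly spread one.
for label, p in [("tightly clustered", 0.5), ("broadly spread", 0.05)]:
    print(label, runs_needed(p))
```

A method whose runs are tightly clustered around a high FoM corresponds to a large p, and the required number of runs falls roughly in inverse proportion to p when p is small.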

4.3. Simulated galaxy data 

Our final discrete dataset is a simulated galaxy clustering set
allowing for an inhomogeneous input catalog. This is common
in astronomy when the sky has been incompletely sampled,
leaving regions with little or no exposure and others with
deep exposures and large numbers of targets. In this case we
have mimicked a scan strategy that fills in strips at constant
depth, leaving the striped pattern seen in Fig. 10. The dataset
contains 13500 points and is purposely of very different scale
in the x and y directions. The radius of the FoV we use is 8
arcmin (assuming the x and y dimensions are RA and DEC),
appropriate to the Southern African Large Telescope.

Fig. 9. Histogram at step j = 5000 of the FoM for HYBRID
and MCMC (SA is similar to MCMC and is not shown for clarity)
on the gradient dataset of Fig. 7. Notice how the HYBRID
runs are significantly more tightly clustered than the MCMC runs,
implying much enhanced consistency: HYBRID relies much less on
luck than SA or MCMC do.

Fig. 10. Dataset 3: Simulated galaxy catalog with 13500 points
and an inhomogeneous density along the x-axis simulating a
partially completed survey.



In this case we imposed periodic boundary conditions and
chose different σ_i^0 for i = x, y to allow for the elongated shape
of the input space. As with the uniform and gradient datasets
described above, HYBRID outperforms both SA and MCMC.
Although the improvement is not exponential, the gains are
significant, in the region of 20%. We now consider cases in
which the improvement provided by HYBRID is much more
dramatic.
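
As a minimal sketch of the periodic boundary conditions and direction-dependent step widths used for the galaxy dataset, the following proposes a new field-of-view centre and wraps it back into the survey box. The function name, the box dimensions and the step widths are illustrative assumptions, not values from the paper.

```python
import random


def propose_periodic(pos, sigma, lower, upper, rng=random):
    """Propose a new field-of-view centre by adding a Gaussian step with
    a separate width per axis, then wrapping the result back into the
    survey box so that the boundaries are periodic."""
    new_pos = []
    for x, s, lo, hi in zip(pos, sigma, lower, upper):
        width = hi - lo
        stepped = x + rng.gauss(0.0, s)
        # Python's % gives a non-negative result for a positive width,
        # so this wraps steps that leave the box on either side.
        new_pos.append(lo + (stepped - lo) % width)
    return new_pos


# Hypothetical elongated survey: 10 units in x but only 1 unit in y,
# with the step width scaled to each axis.
rng = random.Random(1)
pos = [9.9, 0.05]
for _ in range(5):
    pos = propose_periodic(pos, sigma=(1.0, 0.1),
                           lower=(0.0, 0.0), upper=(10.0, 1.0), rng=rng)
    print(pos)
```

Scaling the step width to each axis keeps the typical step a similar fraction of the survey extent in both directions, which is one plausible reason for choosing different widths in an elongated input space.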




Fig. 11. Performance for the dataset of Fig. 10 of the various
methods as a function of step j and averaged over all runs. The
1σ error bars shown were computed from 5000 runs.

4.4. 50-dimensional hyperboloid 

To demonstrate the power of HYBRID it is useful to consider
optimisation of high-dimensionality continuous functions. In
this case the aim is simply to find a path which converges
as close as possible to the global extremum of the function.
Our first continuous function is a standard optimisation test
function: the n-dimensional hyperboloid described by (we take
n = 50):



C_1(x_i) = \sum_{i=1}^{n} x_i^2        (8)



In this case the aim is to minimise the function C_1, whose minimum
clearly occurs at x_i = 0, where C_1(0) = 0. Here we quote the FoM of
the best performing path rather than the sum over all FoV².

The performance of the various methods as a function of
step j and averaged over 5000 runs is shown in Fig. 12.
HYBRID succeeds for a simple reason: FoV that have moved
significantly towards the minimum will have small f and hence
will take small steps, since they are outperforming the average
FoV. Since they are taking small steps, the probability of taking
a step that improves the FoM increases. Conversely, FoV
doing badly try to take large steps. While this is an unsuccessful
strategy on average, it works well in rare cases, leading to
significant improvements. Hence the system cascades down the
slope with a mixture of large, high-risk steps and small,
low-risk steps.
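
The mechanism described above can be sketched as a toy ensemble search. The sketch rests on several assumptions that are not taken from the paper: the cost is a plain sum of squares, the step-size factor is f = (own cost) / (ensemble mean cost), and only improving steps are accepted. The function names, the starting range and all parameter values are hypothetical.

```python
import random


def cost(x):
    """Assumed quadratic test surface: sum of squares, minimum 0 at x = 0."""
    return sum(v * v for v in x)


def adaptive_swarm(n_dim=50, n_agents=10, n_steps=500, base_sigma=1.0, seed=0):
    """Toy ensemble search in which each agent's step width is scaled by
    f = (own cost) / (ensemble mean cost). Agents doing better than the
    average take small steps; agents doing worse take large ones.
    Returns the best cost in the ensemble at the start and after each step."""
    rng = random.Random(seed)
    agents = [[rng.uniform(-100.0, 100.0) for _ in range(n_dim)]
              for _ in range(n_agents)]
    costs = [cost(a) for a in agents]
    history = [min(costs)]
    for _ in range(n_steps):
        mean_cost = sum(costs) / n_agents
        for k in range(n_agents):
            # Hypothetical choice of f; the paper's f may differ.
            f = costs[k] / mean_cost if mean_cost > 0 else 1.0
            sigma = base_sigma * f
            trial = [v + rng.gauss(0.0, sigma) for v in agents[k]]
            trial_cost = cost(trial)
            # Greedy acceptance for brevity: keep only improvements.
            if trial_cost < costs[k]:
                agents[k], costs[k] = trial, trial_cost
        history.append(min(costs))
    return history


history = adaptive_swarm()
print(history[0], history[-1])
```

Because the step width of each agent is tied to its cost relative to the ensemble, the widths shrink for the leading agents as they approach the minimum, without any externally imposed cooling schedule.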

In contrast the MCMC method is forced either to take large 
steps which fail most of the time due to the high dimensionality 
of the system or small steps which imply a huge amount of 
time to reach the minimum. SA interpolates between these two 
by smoothly decreasing the average step size but eventually is 
frozen in because the step size becomes vanishingly small. 
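
The difficulty with large fixed steps in high dimensions can be illustrated by a rough estimate. It is a sketch under two assumptions that are not stated in the text: the cost is a pure sum of squares, C_1 = \sum_i x_i^2, and each proposal is an isotropic Gaussian step \delta with width \sigma per coordinate. Writing r = |x| for the current distance from the minimum:

```latex
\Delta C_1 = C_1(x + \delta) - C_1(x) = 2\, x \cdot \delta + |\delta|^2 ,
\qquad
\langle \Delta C_1 \rangle = n \sigma^2 ,
\qquad
2\, x \cdot \delta \sim \mathcal{N}\!\left(0,\; 4 r^2 \sigma^2\right) .

% For large n, |\delta|^2 is close to its mean n \sigma^2, so
P(\Delta C_1 < 0) \approx \Phi\!\left(-\frac{n \sigma}{2 r}\right) ,
```

where \Phi is the standard normal cumulative distribution function. Under these assumptions a step has an appreciable chance of improving the cost only if \sigma is no larger than about 2r/n, so for n = 50 the useful step width is a small fraction of r and must keep shrinking as the minimum is approached, which a single fixed step size cannot do.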

To give an idea of the performance improvements of the
HYBRID algorithm, consider the best-case FoM reached at j =
2000 out of 5000 runs for each of the methods: the
HYBRID, MCMC and SA algorithms respectively achieved
(0.03, 1006, 4.78 × 10^4). Beyond j = 5500 SA did improve
significantly, surpassing the best MCMC result and reaching 1.874
at j = 10,000; an order of magnitude worse than HYBRID
(1.91 × 10"^) and taking three times longer to reach that point.
We do not show error bars on the points in Fig. 12 since
the distribution of FoM is now very non-Gaussian and cannot easily
be represented on the plot, because the minima in the HYBRID
case can be below 10^-10.

² In the case of optimising continuous functions the notion of fields
of view has little meaning. We are now simply computing the function
at points on the hypersurface of interest.

Fig. 12. Mean performance averaged over 5000 runs as a function
of step j on the continuous 50-dimensional hyperboloid
given by eq. (8). HYBRID completely outperforms both SA
and MCMC since the function f_j allows the FoV to cascade
towards the minimum by taking large steps and then to properly
explore the minimum by enforcing small steps.

In summary HYBRID significantly outperforms both 
MCMC and SA in all areas: best performance, consistency and 
speed to reach a given threshold. 

4.5. Griewangk's test-function 

A second continuous example is provided by another standard
test-bed for optimisation algorithms: Griewangk's function. We
consider only the two-dimensional case, which nevertheless
possesses a very large number of local minima, as can be seen in
the slice through the plane x_2 = 0 shown in Fig. 13. The function
possesses a single global minimum at x_1^* = x_2^* = 0, where
C_2(x_1^*, x_2^*) = 0. Griewangk's function is defined as:

C_2(x_1, x_2) = \frac{1}{4000} \sum_{i=1}^{2} x_i^2 - \prod_{i=1}^{2} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1

where the x_i can range over the interval [-600, 600]. We
choose the starting values to be x_1 = x_2 = 500 in all cases.
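
For reference, a minimal implementation of the test function in its commonly quoted form is sketched below. The function name is illustrative; the code evaluates the function at the global minimum and at the starting point quoted above.

```python
import math


def griewank(x):
    """Griewank test function in its commonly quoted form:
    1 + sum(x_i**2) / 4000 - prod(cos(x_i / sqrt(i))), with i from 1."""
    quad = sum(v * v for v in x) / 4000.0
    osc = 1.0
    for i, v in enumerate(x, start=1):
        osc *= math.cos(v / math.sqrt(i))
    return 1.0 + quad - osc


print(griewank([0.0, 0.0]))      # global minimum
print(griewank([500.0, 500.0]))  # starting point used in the text
```

At the starting point the quadratic term alone contributes 125, while the oscillatory term is bounded between -1 and 1, so the search begins far up the large-scale parabola with the local minima appearing only as a small modulation.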

Figure 14 shows the performance of the HYBRID, SA
and MCMC algorithms versus step number j, again showing
the minimum FoM of the best Markov chain (rather than the
average FoM). It is clear that HYBRID completely outperforms
both SA and MCMC. Although initially all methods lead to
an exponential decrease in C_2, HYBRID has the largest exponent.
Further, both MCMC and SA stagnate rapidly, whereas
HYBRID continues to decrease exponentially, albeit at a
somewhat slower rate, showing the power of the new algorithm.




Fig. 13. Slice through Griewangk's function at x_2 = 0. Note
the two length scales of variability governed by the large-scale
quadratic term and the small-scale oscillatory term.

Fig. 14. Best performance over 5000 runs as a function of step
j on Griewangk's function. HYBRID significantly outperforms
the other methods, reaching values of order 10^10 smaller than
either simulated annealing or MCMC. Each data point is the
minimum FoM achieved by any FoV out of all the runs at step
j. For very small FoM, f is made to depend only on the FoM,
allowing rapid convergence to very small values of the function.
The best values achieved by SA or MCMC after 10^ steps
were achieved by HYBRID after only ~500 steps.



The power of HYBRID can be judged by the minimum values
of the function achieved. For very small FoM, one can take
advantage of the flexibility in f to make it independent of the
other FoV. For FoM < 10"^ we made f a function of the
FoM only. With this modification of f we were able to consistently
get below 10 ''^ within 30000 steps, whereas the MCMC and SA
cases performed very poorly in comparison.

HYBRID outperforms SA and MCMC, despite the function
being only two-dimensional, for a simple reason: there are two
curvature scales in the Griewangk function. The function varies
strongly as x varies by ~10, as one passes through the many
local minima. On the other hand there is an overall average
parabolic shape visible as x varies over ~100. HYBRID has
the flexibility to deal with multiple curvature scales, while
standard methods can adapt to only one of the scales, leading to
poor performance overall.

5. Conclusions 

The problem of optimal target selection will be key in maximally
extracting value from the large datasets that will characterise
many sciences in the coming decades. From astronomy
to genetics there is a need for a formalism that can optimally
select targets for a specific experiment using a specific instrument.
In extragalactic astronomy and cosmology there will be
significant pressure to select optimal subsets of imaging data
for spectroscopic follow-up, driving the development of efficient
algorithms for this purpose.

In this paper we present a new optimisation algorithm,
HYBRID, and compare its performance on simulated data with
that of two standard stochastic algorithms: simulated annealing
(SA) and Markov-Chain Monte Carlo (MCMC). The key advance
in HYBRID is the idea to parallelise the search, sharing
information between fields of view ('agents') at each step in the
search so that each field of view has knowledge of how well it
is performing relative to the ensemble of fields of view. As a
result the average step size taken by a field of view can be adapted
to its own performance. In this sense HYBRID is a combination
of SA, MCMC and Particle Swarm Optimisation. We show
how HYBRID outperforms both SA and MCMC in all cases
and is particularly efficient in the cases where there are multiple
clustering scales in the data and/or the target space has high
dimensionality. In the case of the minimisation of known test
functions, HYBRID significantly outperforms all other methods
by up to ten orders of magnitude, indicating that HYBRID
will be useful far beyond the specific problem of optimal target
selection.

Future work will apply HYBRID to target selection for the 
new 10m SALT telescope in South Africa. 

We acknowledge use of the UCT Physics cluster, Carmen,
for some of the simulations in this paper.